Skip to content

Add Rapid bucket template cache#2876

Draft
ValentaTomas wants to merge 14 commits into
mainfrom
valenta/rapid-bucket-cache
Draft

Add Rapid bucket template cache#2876
ValentaTomas wants to merge 14 commits into
mainfrom
valenta/rapid-bucket-cache

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented May 30, 2026

Adds a disabled-by-default Rapid bucket read-through cache for template artifacts. Rapid cache hits/writes update Redis asynchronously for chunk-level cleanup hints, while reads still use deterministic bucket paths and fall back to canonical storage.

Tested: go test ./packages/shared/pkg/storage ./packages/orchestrator/cmd/clean-rapid-cache

Add a feature-gated bucket-backed read-through cache for template artifacts as an alternative to the shared NFS cache.
@cla-bot cla-bot Bot added the cla-signed label May 30, 2026
@cursor
Copy link
Copy Markdown

cursor Bot commented May 30, 2026

PR Summary

Medium Risk
Changes template I/O path selection and introduces async cache writeback plus Redis/GCS state; misconfiguration or index drift could affect latency or cleanup, but the flag defaults off and failures fall back to canonical storage.

Overview
This PR adds an optional Rapid bucket read-through cache for template artifact reads, gated by use-rapid-bucket-cache (default off) and RAPID_BUCKET_CACHE_BUCKET_NAME. When enabled on non-build template loads, reads try deterministic rapid-cache/ objects in a dedicated GCS bucket (zonal client + finalize-on-close writes) before falling back to canonical storage, with async population on miss and the same skip-writeback context behavior as the NFS cache. A Redis-backed index records chunk access and sizes for eviction/cleanup hints when Redis and the bucket are configured; otherwise the index is a no-op. Rapid cache takes precedence over the existing NFS template cache path when both could apply.

Reviewed by Cursor Bugbot for commit 334818f. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

In packages/shared/pkg/storage/storage_cache_rapid.go, calling c.wg.Go will cause a compilation error because sync.WaitGroup does not have a Go method. The background cache write-back goroutines do not need to be synchronized or waited on, so the function should be run directly in a goroutine and the unused wg field removed from the rapidCachedSeekable struct.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

❌ 3 Tests Failed:

Tests completed Failed Passed Skipped
2707 3 2704 5
View the full list of 3 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 42.64% (Passed 764 times, Failed 568 times)

Stack Traces | 93.2s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: ik89ypblbcikec1zzd9v6
    sandbox_list_test.go:327: Created sandbox 2/12: i3cjp9waelphlmlcairy5
    sandbox_list_test.go:327: Created sandbox 3/12: ifad0n3a3k6h6fq4u287l
    sandbox_list_test.go:327: Created sandbox 4/12: icyhps1dyo1si4dsc90kr
    sandbox_list_test.go:327: Created sandbox 5/12: iwfja083caejf28hccy93
    sandbox_list_test.go:327: Created sandbox 6/12: idyjabw32i79hhnui1tuh
    sandbox_list_test.go:327: Created sandbox 7/12: i9jq333p4ovi59xdkqgvo
    sandbox_list_test.go:327: Created sandbox 8/12: iz84sgud42pi98d76qynp
    sandbox_list_test.go:327: Created sandbox 9/12: ib5z6qftaajwv3fgqax2r
    sandbox_list_test.go:327: Created sandbox 10/12: i9uj89mo19eq2gnk7hu2l
    sandbox_list_test.go:327: Created sandbox 11/12: iqn68pipa5v51y54inkw2
    sandbox_list_test.go:327: Created sandbox 12/12: ihfcu7xx1433hyelrymfk
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:340
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	"[]" should have 12 item(s), but has 0
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:330
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxListPaginationRunningLargerLimit
--- FAIL: TestSandboxListPaginationRunningLargerLimit (93.22s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 57.45% (Passed 757 times, Failed 1022 times)

Stack Traces | 64.2s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (64.17s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 57.56% (Passed 747 times, Failed 1013 times)

Stack Traces | 198s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1254}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 191 MB\nFree memory before tmpfs mount: 793 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Memory to use in integrity test (60% of free, min 64MB): 475 MB\n"}}
Executing command bash in sandbox iszh7obkyqmgj0zpt5f8g (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"475+0 records in\n475+0 records out\n498073600 bytes (498 MB, 475 MiB) copied, 1.96548 s, 253 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=475\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 1.92\n\tPercent of CPU this job got: 97%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:01.97\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2604\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 341\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 11\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 670 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox igrxp4rnrknm8fxpfpr4s
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1270}}
Executing command bash in sandbox imcw7shg1qwzr9s873h7r (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"4170d7af170bafe01b6f094ae38233e7546f2967534575498076b290805412a0\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox igrxp4rnrknm8fxpfpr4s
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1273}}
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
Executing command bash in sandbox igrxp4rnrknm8fxpfpr4s (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox igrxp4rnrknm8fxpfpr4s: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (197.94s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Rapid cache not invalidated on delete
    • Added async deletion of rapid-cache/ prefixed objects in DeleteObjectsWithPrefix to match NFS cache cleanup behavior.
  • ✅ Resolved by another fix: Rapid cache orphaned on template delete
    • This bug is resolved by the same fix applied to bug 1c5b6c3a which adds cache invalidation to DeleteObjectsWithPrefix.
  • ✅ Fixed: Compressed rapid hits skip size check
    • Added size validation in openCompressed to check cached object size matches expected frame length, with automatic invalidation of mismatched entries.

Create PR

Or push these changes by commenting:

@cursor push ee088a1d9b
Preview (ee088a1d9b)
diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -22,6 +22,12 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
+	// Delete from cache bucket asynchronously (cached objects are under "rapid-cache/" prefix)
+	go func(ctx context.Context) {
+		cachePrefix := "rapid-cache/" + prefix
+		_ = p.cache.DeleteObjectsWithPrefix(ctx, cachePrefix)
+	}(context.WithoutCancel(ctx))
+
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)
 }
 
@@ -88,14 +94,26 @@
 
 	cachePath := makeFrameFilename(c.path, r)
 	if raw, err := c.openCache(ctx, cachePath, int64(r.Length)); err == nil {
-		dec, err := newDecompressingReadCloser(raw, frameTable.CompressionType())
-		if err != nil {
-			raw.Close()
+		// Validate cached object size matches expected frame length
+		size, sizeErr := c.getCacheSize(ctx, cachePath)
+		if sizeErr == nil && size == int64(r.Length) {
+			dec, err := newDecompressingReadCloser(raw, frameTable.CompressionType())
+			if err != nil {
+				raw.Close()
 
-			return nil, err
+				return nil, err
+			}
+
+			return dec, nil
 		}
-
-		return dec, nil
+		// Size mismatch or size check failed: close reader and delete bad cache entry
+		raw.Close()
+		if sizeErr == nil {
+			// Confirmed size mismatch: delete asynchronously to allow refetch
+			go func(ctx context.Context) {
+				_ = c.cache.DeleteObjectsWithPrefix(ctx, cachePath)
+			}(context.WithoutCancel(ctx))
+		}
 	}
 
 	raw, err := c.inner.OpenRangeReader(ctx, r.Offset, int64(r.Length), nil)
@@ -130,6 +148,15 @@
 	return obj.OpenRangeReader(ctx, 0, length, nil)
 }
 
+func (c *rapidCachedSeekable) getCacheSize(ctx context.Context, path string) (int64, error) {
+	obj, err := c.cache.OpenSeekable(ctx, path, UnknownSeekableObjectType)
+	if err != nil {
+		return 0, err
+	}
+
+	return obj.Size(ctx)
+}
+
 func (c *rapidCachedSeekable) writeCache(ctx context.Context, path string, data []byte) error {
 	blob, err := c.cache.OpenBlob(ctx, path, UnknownObjectType)
 	if err != nil {

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Comment thread packages/orchestrator/pkg/sandbox/template/cache.go
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Validate cached object sizes, clean cache prefixes with canonical deletes, and use plain goroutines for best-effort writeback.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Rapid cache skips read validation
    • Added validateReadParams method to rapidCachedSeekable to ensure offset alignment and prevent cache key collisions from unaligned reads.
  • ✅ Fixed: Cache delete blocks primary delete
    • Changed DeleteObjectsWithPrefix to ignore cache deletion failures by discarding the error and always proceeding to delete from primary storage.

Create PR

Or push these changes by commenting:

@cursor push 65a2bd76f7
Preview (65a2bd76f7)
diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -21,9 +21,7 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
-	if err := p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix); err != nil {
-		return err
-	}
+	_ = p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix)
 
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)
 }
@@ -64,6 +62,10 @@
 		return c.openCompressed(ctx, off, frameTable)
 	}
 
+	if err := c.validateReadParams(length, off); err != nil {
+		return nil, err
+	}
+
 	cachePath := fmt.Sprintf("%s/%012d-%d.bin", c.path, off/MemoryChunkSize, length)
 	if rc, err := c.openCache(ctx, cachePath, length); err == nil {
 		return rc, nil
@@ -114,6 +116,23 @@
 	return dec, nil
 }
 
+func (c *rapidCachedSeekable) validateReadParams(buffSize, offset int64) error {
+	if buffSize == 0 {
+		return ErrBufferTooSmall
+	}
+	if buffSize > MemoryChunkSize {
+		return ErrBufferTooLarge
+	}
+	if offset%MemoryChunkSize != 0 {
+		return ErrOffsetUnaligned
+	}
+	if (offset%MemoryChunkSize)+buffSize > MemoryChunkSize {
+		return ErrMultipleChunks
+	}
+
+	return nil
+}
+
 func (c *rapidCachedSeekable) Size(ctx context.Context) (int64, error) {
 	return c.inner.Size(ctx)
 }

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
@ValentaTomas ValentaTomas requested a review from jakubno May 31, 2026 02:18
Validate uncompressed rapid cache reads and avoid blocking canonical deletes on cache cleanup failures.
Add a disabled-by-default periodic GCS cleaner for rapid-cache objects so the cache can be bounded without relying on bucket lifecycle rules.
Make the rapid cache cleaner explicit about ignoring client close errors.
Satisfy orchestrator lint formatting for the dry-run branch.
Record Rapid cache chunk recency asynchronously so cleanup can evict cold chunks while reads still use deterministic bucket paths.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Cleanup deletes recently used cache
    • Added IsRecent check during bucket scan to skip objects that exist in Redis index, preventing deletion of hot cache entries whose GCS Updated timestamp is old but Redis score is recent.
  • ✅ Fixed: Successful index pass skips bucket scan
    • Modified clean function to always run both cleanFromIndex and cleanFromBucket passes, ensuring orphaned objects (in GCS but not in Redis) are discovered and removed even when the index pass deletes objects.
  • ✅ Fixed: Build delete leaves Redis index
    • Added EvictPrefix method to RapidCacheIndex interface and implemented it using Redis ZSCAN to bulk-remove matching entries from chunks ZSET and decrement build counters when DeleteObjectsWithPrefix is called.

Create PR

Or push these changes by commenting:

@cursor push 25445707df
Preview (25445707df)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -71,15 +71,21 @@
 		_ = client.Close()
 	}()
 
-	if deleted, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index); err != nil {
+	deletedIndex, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index)
+	if err != nil {
 		return err
-	} else if deleted > 0 {
-		log.Printf("summary dry_run=%t deleted=%d source=redis", dryRun, deleted)
+	}
 
+	if deletedIndex > 0 {
+		log.Printf("summary dry_run=%t deleted=%d source=redis", dryRun, deletedIndex)
+	}
+
+	remaining := maxDeletions - deletedIndex
+	if remaining <= 0 {
 		return nil
 	}
 
-	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun)
+	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, remaining, dryRun, index)
 }
 
 func cleanFromIndex(ctx context.Context, client *gcs.Client, bucket string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) (int, error) {
@@ -117,7 +123,7 @@
 	return deleted, nil
 }
 
-func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool) error {
+func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) error {
 	var scanned, matched, deleted int
 	objects := client.Bucket(bucket).Objects(ctx, &gcs.Query{Prefix: prefix})
 	for {
@@ -133,6 +139,13 @@
 		if !attrs.Updated.Before(cutoff) {
 			continue
 		}
+
+		if !dryRun {
+			if recent, err := index.IsRecent(ctx, attrs.Name); err == nil && recent {
+				continue
+			}
+		}
+
 		matched++
 		if deleted >= maxDeletions {
 			break
@@ -145,6 +158,7 @@
 		if err := client.Bucket(bucket).Object(attrs.Name).Delete(ctx); err != nil {
 			return fmt.Errorf("delete cache object: %w", err)
 		}
+		_ = index.Evict(ctx, attrs.Name, attrs.Size)
 		deleted++
 	}
 

diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -27,6 +27,7 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
+	_ = p.index.EvictPrefix(ctx, prefix)
 	_ = p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix)
 
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)

diff --git a/packages/shared/pkg/storage/storage_cache_rapid_index.go b/packages/shared/pkg/storage/storage_cache_rapid_index.go
--- a/packages/shared/pkg/storage/storage_cache_rapid_index.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid_index.go
@@ -16,6 +16,8 @@
 	Admit(ctx context.Context, path string, size int64) error
 	Evict(ctx context.Context, path string, size int64) error
 	Candidates(ctx context.Context, before time.Time, limit int64) ([]string, error)
+	IsRecent(ctx context.Context, path string) (bool, error)
+	EvictPrefix(ctx context.Context, prefix string) error
 }
 
 type noopRapidCacheIndex struct{}
@@ -28,6 +30,8 @@
 func (noopRapidCacheIndex) Candidates(context.Context, time.Time, int64) ([]string, error) {
 	return nil, nil
 }
+func (noopRapidCacheIndex) IsRecent(context.Context, string) (bool, error) { return false, nil }
+func (noopRapidCacheIndex) EvictPrefix(context.Context, string) error      { return nil }
 
 type redisRapidCacheIndex struct {
 	redis redis.UniversalClient
@@ -110,6 +114,61 @@
 	}).Result()
 }
 
+func (i *redisRapidCacheIndex) IsRecent(ctx context.Context, path string) (bool, error) {
+	score := i.redis.ZScore(ctx, i.keys.chunks, path)
+	if score.Err() == redis.Nil {
+		return false, nil
+	}
+	if score.Err() != nil {
+		return false, score.Err()
+	}
+
+	return true, nil
+}
+
+func (i *redisRapidCacheIndex) EvictPrefix(ctx context.Context, prefix string) error {
+	script := redis.NewScript(`
+local cursor = "0"
+local prefix = ARGV[1]
+local chunks_key = KEYS[1]
+local build_bytes_key = KEYS[2]
+local build_chunks_key = KEYS[3]
+
+repeat
+  local scan_result = redis.call('ZSCAN', chunks_key, cursor, 'MATCH', prefix .. '*', 'COUNT', 100)
+  cursor = scan_result[1]
+  local members = scan_result[2]
+  
+  for i = 1, #members, 2 do
+    local path = members[i]
+    redis.call('ZREM', chunks_key, path)
+    
+    local path_without_prefix = string.gsub(path, '^rapid%-cache/', '')
+    local build_id = string.match(path_without_prefix, '^([^/]+)')
+    if build_id then
+      local obj_result = redis.call('HGET', build_bytes_key, build_id)
+      if obj_result then
+        redis.call('HINCRBY', build_chunks_key, build_id, -1)
+        local chunks_left = redis.call('HGET', build_chunks_key, build_id)
+        if tonumber(chunks_left) <= 0 then
+          redis.call('HDEL', build_bytes_key, build_id)
+          redis.call('HDEL', build_chunks_key, build_id)
+        end
+      end
+    end
+  end
+until cursor == "0"
+
+return 0
+`)
+
+	return script.Run(ctx, i.redis, []string{
+		i.keys.chunks,
+		i.keys.buildBytes,
+		i.keys.buildChunks,
+	}, "rapid-cache/"+prefix).Err()
+}
+
 func rapidCacheBuildID(path string) string {
 	path = strings.TrimPrefix(path, rapidCachePrefix)
 	buildID, _ := SplitPath(path)

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Evict zero size skews Redis stats
    • Added size > 0 check in Evict to prevent decrementing build counters when evicting ghost entries with zero size.
  • ✅ Fixed: Bucket deletes skip Redis eviction
    • Added index.Evict call in cleanFromBucket after GCS deletion to synchronize Redis index with bucket state.

Create PR

Or push these changes by commenting:

@cursor push cffa73b102
Preview (cffa73b102)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -79,7 +79,7 @@
 		return nil
 	}
 
-	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun)
+	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun, index)
 }
 
 func cleanFromIndex(ctx context.Context, client *gcs.Client, bucket string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) (int, error) {
@@ -117,7 +117,7 @@
 	return deleted, nil
 }
 
-func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool) error {
+func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) error {
 	var scanned, matched, deleted int
 	objects := client.Bucket(bucket).Objects(ctx, &gcs.Query{Prefix: prefix})
 	for {
@@ -145,6 +145,7 @@
 		if err := client.Bucket(bucket).Object(attrs.Name).Delete(ctx); err != nil {
 			return fmt.Errorf("delete cache object: %w", err)
 		}
+		_ = index.Evict(ctx, attrs.Name, attrs.Size)
 		deleted++
 	}
 

diff --git a/packages/shared/pkg/storage/storage_cache_rapid_index.go b/packages/shared/pkg/storage/storage_cache_rapid_index.go
--- a/packages/shared/pkg/storage/storage_cache_rapid_index.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid_index.go
@@ -92,7 +92,7 @@
 	buildID := rapidCacheBuildID(path)
 	pipe := i.redis.Pipeline()
 	pipe.ZRem(ctx, i.keys.chunks, path)
-	if buildID != "" {
+	if buildID != "" && size > 0 {
 		pipe.HIncrBy(ctx, i.keys.buildBytes, buildID, -size)
 		pipe.HIncrBy(ctx, i.keys.buildChunks, buildID, -1)
 	}

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_rapid_index.go
Avoid deferred cleanup before fatal exit and keep Redis cleanup fallback explicit.
Use Redis touch scores to avoid deleting recently used chunks during bucket fallback cleanup and clean stale Redis entries after deletes.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Rapid cache delete errors ignored
    • Changed DeleteObjectsWithPrefix to return cache deletion errors instead of discarding them, ensuring both cache and inner deletion failures are properly propagated.
  • ✅ Fixed: LastAccess Redis errors ignored
    • Modified cleanFromBucket to check and return errors from LastAccess, preventing Redis failures from being treated as missing index entries that could lead to incorrect deletions.

Create PR

Or push these changes by commenting:

@cursor push d8236c266b

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Apply gofumpt formatting to the rapid cache index helper.
Surface rapid prefix delete failures and avoid bucket fallback deletes when Redis recency cannot be checked.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Redis error skips bucket cleanup
    • Changed cleanFromBucket to fall back to GCS timestamp when Redis LastAccess fails, treating errors the same as missing keys.
  • ✅ Fixed: Index deletes skip access recheck
    • Added LastAccess recheck in cleanFromIndex before deletion to prevent removing objects that were touched after Candidates snapshot.
  • ✅ Fixed: Index cleanup error skips bucket scan
    • Changed clean function to log cleanFromIndex errors and continue to cleanFromBucket instead of returning immediately.

Create PR

Or push these changes by commenting:

@cursor push b58d31345b
Preview (b58d31345b)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -73,7 +73,7 @@
 
 	deleted, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index)
 	if err != nil {
-		return err
+		log.Printf("cleanFromIndex failed (continuing with bucket scan): %v", err)
 	}
 
 	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions-deleted, dryRun, index)
@@ -99,6 +99,10 @@
 		if err != nil {
 			return deleted, fmt.Errorf("read cache object metadata: %w", err)
 		}
+		lastAccess, ok, err := index.LastAccess(ctx, path)
+		if err == nil && ok && lastAccess >= cutoff.Unix() {
+			continue
+		}
 		if dryRun {
 			deleted++
 
@@ -135,12 +139,9 @@
 			continue
 		}
 		lastAccess, ok, err := index.LastAccess(ctx, attrs.Name)
-		if err != nil {
+		if err == nil && ok && lastAccess >= cutoff.Unix() {
 			continue
 		}
-		if ok && lastAccess >= cutoff.Unix() {
-			continue
-		}
 		matched++
 		if deleted >= maxDeletions {
 			break

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Make Redis index cleanup best-effort, recheck recency before index deletes, and let bucket fallback proceed when Redis is unavailable.
@ValentaTomas
Copy link
Copy Markdown
Member Author

Closing until the zonal bucket is available in specific regions.

@ValentaTomas ValentaTomas reopened this Jun 1, 2026
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Redis failure hides cleanup protection
    • Changed newRapidIndex to fail fast with log.Fatalf when Redis client creation fails instead of silently returning a noop index.
  • ✅ Fixed: Admit failure after cache Put
    • Changed writeCache to return the error from index.Admit instead of discarding it to ensure index counters remain consistent.
  • ✅ Fixed: Candidates Redis errors ignored
    • Separated error and empty candidate checks in cleanFromIndex and added logging when Candidates returns an error.

Create PR

Or push these changes by commenting:

@cursor push 4d0bcd0b7f
Preview (4d0bcd0b7f)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -78,9 +78,13 @@
 
 func cleanFromIndex(ctx context.Context, client *gcs.Client, bucket string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) int {
 	candidates, err := index.Candidates(ctx, cutoff, int64(maxDeletions))
-	if err != nil || len(candidates) == 0 {
+	if err != nil {
+		log.Printf("error fetching candidates from index: %v", err)
 		return 0
 	}
+	if len(candidates) == 0 {
+		return 0
+	}
 
 	deleted := 0
 	for _, path := range candidates {
@@ -167,7 +171,7 @@
 		RedisTLSCABase64: os.Getenv("REDIS_TLS_CA_BASE64"),
 	})
 	if err != nil {
-		return storage.NoopRapidCacheIndex(), func() {}
+		log.Fatalf("failed to create Redis client: %v", err)
 	}
 
 	return storage.NewRedisRapidCacheIndex(redisClient, bucket), func() {

diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -176,7 +176,9 @@
 		return err
 	}
 
-	_ = c.index.Admit(ctx, path, int64(len(data)))
+	if err := c.index.Admit(ctx, path, int64(len(data))); err != nil {
+		return err
+	}
 
 	return nil
 }

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated

_ = c.index.Admit(ctx, path, int64(len(data)))

return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admit failure after cache Put

Medium Severity

writeCache returns success after Put even when Admit fails, because the Redis error is discarded. The rapid bucket holds the chunk but the index counters and chunk score may be wrong, so cleanup can skew build stats or later Evict can drive negative Redis hash counts.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b61e373. Configure here.

Drop the operational machinery (periodic cleanup Nomad job, clean-rapid-cache
binary, Makefile/Dockerfile, dedicated Terraform vars) from this PR so it only
adds the flag-gated read-through cache that plugs in where the NFS cache does.
The feature stays enable-able via the existing orchestrator_env_vars map; the
cleanup job lands in a follow-up. Also only wire the Redis index when the bucket
is configured so startup is inert by default.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Touch before Admit skews counters
    • Fixed by changing the Lua script to use ZSCORE to check membership atomically before modifications, ensuring counters increment correctly even when Touch runs before Admit.

Create PR

Or push these changes by commenting:

@cursor push 3e9428e4c2
Preview (3e9428e4c2)
diff --git a/packages/shared/pkg/storage/storage_cache_rapid_index.go b/packages/shared/pkg/storage/storage_cache_rapid_index.go
--- a/packages/shared/pkg/storage/storage_cache_rapid_index.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid_index.go
@@ -136,12 +136,12 @@
 }
 
 var rapidCacheAdmitScript = redis.NewScript(`
-local added = redis.call('ZADD', KEYS[1], 'NX', ARGV[1], ARGV[2])
+local existed = redis.call('ZSCORE', KEYS[1], ARGV[2]) ~= false
 redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])
 redis.call('ZADD', KEYS[2], ARGV[1], ARGV[3])
-if added == 1 then
+if not existed then
   redis.call('HINCRBY', KEYS[3], ARGV[3], ARGV[4])
   redis.call('HINCRBY', KEYS[4], ARGV[3], 1)
 end
-return added
+return existed and 0 or 1
 `)

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 334818f. Configure here.

if added == 1 then
redis.call('HINCRBY', KEYS[3], ARGV[3], ARGV[4])
redis.call('HINCRBY', KEYS[4], ARGV[3], 1)
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touch before Admit skews counters

Medium Severity

After a chunk is written to the rapid bucket, another reader can hit that object and run async Touch, which ZADDs the chunk path before the write-back goroutine runs Admit. The admit script treats an existing member as a duplicate and skips HINCRBY on build_bytes/build_chunks, so Redis cleanup aggregates stay wrong even though the object is cached.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 334818f. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant